Illinois CCG TAC 2015 Event Nugget, Entity Discovery and Linking, and Slot Filler Validation Systems

Authors

  • Mark Sammons
  • Haoruo Peng
  • Shyam Upadhyay
  • Pavankumar Reddy
  • Subhro Roy
  • Dan Roth
Abstract

This paper describes the University of Illinois Cognitive Computation Group (UI-CCG) submissions for three TAC tracks: Event Nugget Detection and Coreference, Entity Discovery and Linking (EDL), and Slot Filler Validation (SFV). The Event Nugget Detection and Coreference system employs a supervised model for event nugget detection with rich lexical and semantic features, and we experiment with both supervised and unsupervised event co-reference methods. We also use ACE2005 data as an additional training source and apply several domain adaptation techniques to improve the system's performance. The Entity Discovery and Linking system focuses on the Spanish subtask: it uses Google Translate to translate Spanish documents into English and then applies the Illinois Wikifier to identify entity mentions and disambiguate them to Wikipedia entries. It outperforms the other participants on both the linking and the clustering evaluations. The Illinois SFV system treats the task as an entailment problem, seeking to identify, for each individual query, whether or not the proposed answer is valid based on the information contained in the query document. The system builds on those of previous years and uses a machine learning component to extract cues from unmarked relations in the context of the query relation. The three systems described here were developed as separate systems.

1 Event Nugget Detection and Co-reference

In this section, we describe our submission to the TAC KBP event task. Our team participated in the TAC KBP Event Nugget (EN) track, which includes three sub-tasks: event nugget detection, and event co-reference based on gold and on predicted event nuggets. Our system uses a supervised model for event nugget detection with rich lexical and semantic features. For the event co-reference model, we experiment with both supervised and unsupervised methods: for the supervised model, we train a classifier to estimate the similarity between each pair of event nuggets, while we also use ESA representations (Gabrilovich and Markovitch, 2007; Song and Roth, 2015) to compute this similarity in an unsupervised fashion (sketched at the end of this section). We describe each module of our system in the following sub-sections and discuss several techniques that we employ.

1.1 Event Nugget Detection

We use a stage-wise classification approach to extract all events (Ahn, 2006; Chen and Ng, 2012). We first train a 34-class classifier (33 event subtypes plus one non-event class) to detect event nuggets and classify them into types, applying it to each token. Features for this supervised classifier include lexical features, features from the parser, Named Entity Recognition (NER), Semantic Role Labeling (SRL), entity co-reference, and WordNet, as well as other semantic features from Explicit Semantic Analysis (ESA) (Gabrilovich and Markovitch, 2005; Gabrilovich and Markovitch, 2007) and Brown clusters (Brown et al., 1992). We then apply a classifier with the same set of rich features to each detected event nugget to obtain REALIS information (ACTUAL, GENERIC, or OTHER); both stages are sketched below.
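A minimal sketch of these two stages follows (illustrative only: our actual system is built on the Illinois NLP pipeline rather than scikit-learn, LinearSVC merely stands in for the L2-loss SVM described under "Learning Model" below, and extract_features is a placeholder for the rich feature set enumerated next).

```python
# Illustrative sketch only, not our implementation. LinearSVC (squared hinge,
# i.e. L2 loss) stands in for the SVM of "Learning Model"; extract_features is
# a placeholder for the feature categories listed under "Features".
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC

NON_EVENT = "NONE"  # the 34th class: token is not an event nugget


def extract_features(tokens, i):
    """Placeholder per-token features; the real system also adds parse-tree,
    NER, SRL, co-reference, WordNet, ESA, and Brown-cluster features."""
    return {
        "lemma": tokens[i].lower(),
        "prev": tokens[i - 1].lower() if i > 0 else "<S>",
        "next": tokens[i + 1].lower() if i + 1 < len(tokens) else "</S>",
    }


class NuggetPipeline:
    def __init__(self, C=0.1):                  # C tuned on a development set
        self.vec = DictVectorizer()
        self.type_clf = LinearSVC(C=C)          # 34-way: 33 subtypes + NON_EVENT
        self.realis_clf = LinearSVC(C=C)        # ACTUAL / GENERIC / OTHER

    def fit(self, sentences, type_labels, realis_labels):
        feats, y_type, nugget_rows, y_realis = [], [], [], []
        for sent, types, realis in zip(sentences, type_labels, realis_labels):
            for i in range(len(sent)):
                feats.append(extract_features(sent, i))
                y_type.append(types[i])
                if types[i] != NON_EVENT:        # realis is only defined on nuggets
                    nugget_rows.append(len(feats) - 1)
                    y_realis.append(realis[i])
        X = self.vec.fit_transform(feats)
        self.type_clf.fit(X, y_type)
        self.realis_clf.fit(X[nugget_rows], y_realis)
        return self

    def predict(self, sent):
        X = self.vec.transform([extract_features(sent, i) for i in range(len(sent))])
        types = self.type_clf.predict(X)
        # In the actual pipeline the realis classifier is applied only to the
        # tokens detected as nuggets by the first stage.
        realis = self.realis_clf.predict(X)
        return [(sent[i], t, r)
                for i, (t, r) in enumerate(zip(types, realis)) if t != NON_EVENT]
```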
Features

The features can be summarized in the following categories:

1. Lexical features: the context (part-of-speech tag and lemma) of a candidate token in windows of size 5 and 20, plus their conjunctions.

2. Seed features: we use 140 seeds for event triggers, following previous work (Bronstein et al., 2015). We consider whether a candidate token is a seed (and its type if it matches), and the conjunction of the matched seed and context seeds (and their types).

3. Parse tree features: the path from a candidate token to the root, the number of its right/left siblings and their categories, and the paths connecting a candidate token with other seeds or named entities.

4. NER features: named entities and their types within a window of size 20 around a candidate token.

5. SRL features: whether a candidate token is a verb-SRL/nominal-SRL predicate and its role, its conjunction with SRL relation names, and the conjunction of the SRL relation name and the NER types in the context.

6. Coref features: entities co-referent with the candidate token, and their conjunctions with both the candidate token and named entities in the context.

7. ESA features: the top 50 ESA concepts for each candidate token.

8. Brown cluster features: Brown cluster vectors with prefix lengths 4, 6, 10, and 20.

9. WordNet features: hypernyms, hyponyms, entailment words, and derived words of both the candidate token and its context, as well as the WordNet relations between the candidate token and the seed words.

10. Other features: whether a candidate token appears in FrameNet/PropBank or is a deverbal noun.

Learning Model

We use a Support Vector Machine (SVM) to train both the event nugget detection classifier and the realis classifier, with L2 loss and C set to 0.1 after tuning on a development set. We use the Illinois NLP packages for NER (http://cogcomp.cs.illinois.edu/page/software_view/NETagger), SRL (http://cogcomp.cs.illinois.edu/page/software_view/SRL), and entity co-reference.

Domain Adaptation

Apart from the KBP training data, we use ACE2005 as an additional source of training data; the ACE event taxonomy is similar to that of the KBP task. To enable domain adaptation from ACE to KBP, we employ the following techniques:

1. We treat event triggers in the ACE annotations as event nuggets for the KBP task.

2. We apply a deterministic rule to convert ACE realis information to the KBP formulation. Specifically, we map "Genericity.Past" and "Tense.Past" in ACE to "Actual" in KBP, use "Genericity.Generic" directly as "Generic", and map "Tense.Unspecified" (and sometimes also "Tense.Future") to "Others".

3. As ACE and KBP have different distributions over event types, we resample the ACE data to match the event nugget type distribution of KBP.

There is also a notable mismatch in event density between ACE and KBP: each sentence contains 0.34 events on average in ACE, versus 0.82 in KBP, which is significantly higher. We therefore also subsample the negative training examples in ACE to obtain a positive-to-negative ratio similar to that of KBP (both this step and the realis conversion from item 2 are sketched after the Results paragraph below).

Results

We have two development datasets, one drawn from ACE2005 and the other from the KBP data. On ACE2005, we select 40 newswire documents for testing and use the rest for training; we use this ACE development set only to evaluate performance on ACE. For the KBP data, we select 30 documents (20% of the available data) as the development set; these documents cover both the newswire and the discussion forum genres. We use this KBP development set to evaluate models trained on the KBP data alone and on the ACE-KBP combined data with the domain adaptation techniques. Results on the two development sets are shown in Table 1. The overall score on the KBP development set shows that it is best to train on the ACE-KBP combined data without the resampling and subsampling techniques. However, the sampling technique improves recall.
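The realis conversion (item 2 above) and the negative subsampling can be sketched as follows. This is an illustration rather than our conversion code: the example dictionary fields and the fallback for attribute combinations not covered by the rule above are assumptions.

```python
import random


def ace_realis_to_kbp(genericity, tense):
    """One reading of the deterministic ACE -> KBP realis rule in item 2:
    Past maps to Actual, Generic stays Generic, Unspecified/Future map to
    Others. The final fallback is an assumption for unlisted combinations."""
    if genericity == "Generic":
        return "Generic"
    if genericity == "Past" or tense == "Past":
        return "Actual"
    if tense in ("Unspecified", "Future"):
        return "Others"
    return "Others"  # assumption: default for anything not covered above


def subsample_negatives(ace_examples, kbp_pos_neg_ratio, seed=0):
    """Keep all ACE positive (event-nugget) tokens and only as many negative
    tokens as needed to roughly match KBP's positive/negative ratio. Each
    example is assumed to be a dict whose "label" field is "NONE" for a
    non-event token."""
    rng = random.Random(seed)
    pos = [ex for ex in ace_examples if ex["label"] != "NONE"]
    neg = [ex for ex in ace_examples if ex["label"] == "NONE"]
    n_keep = min(len(neg), int(round(len(pos) / kbp_pos_neg_ratio)))
    return pos + rng.sample(neg, n_keep)
```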

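For completeness, the unsupervised co-reference scoring mentioned in the introduction to this section can be sketched as below: each event nugget (with its context) is mapped to a sparse ESA concept vector, and a pair of nuggets is compared by cosine similarity. How the ESA vectors are produced and how the decision threshold is set are not specified above, so the get_esa_vector lookup and the 0.5 threshold here are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse ESA vectors (dict: concept -> weight)."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def coreferent(nugget_a, nugget_b, get_esa_vector, threshold=0.5):
    """Unsupervised pairwise decision: two nuggets are considered co-referent
    when their ESA representations are similar enough. `get_esa_vector` is an
    assumed lookup returning the sparse ESA vector of a nugget's text and
    context; the threshold is an illustrative value, not a tuned one."""
    return cosine(get_esa_vector(nugget_a), get_esa_vector(nugget_b)) >= threshold
```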

